CryptoNAS: Private Inference on a ReLU Budget
Machine learning as a service has given rise to privacy concerns surrounding clients' data and providers' models and has catalyzed research in private inference (PI): methods to process inferences without disclosing inputs. Recently, researchers have adapted cryptographic techniques to show PI is possible; however, all existing solutions increase inference latency beyond practical limits. This paper makes the observation that existing models are ill-suited for PI and proposes a novel NAS method, named CryptoNAS, for finding and tailoring models to the needs of PI. The key insight is that in PI operator latency costs are inverted: non-linear operations (e.g., ReLU) dominate latency, while linear layers become effectively free. We develop the idea of a ReLU budget as a proxy for inference latency and use CryptoNAS to build models that maximize accuracy within a given budget. CryptoNAS improves accuracy by 3.4% and latency by 2.4x over the state of the art.
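A model's ReLU budget is just the number of ReLU evaluations in one forward pass, which scales with the channel count and spatial size at each activation. A minimal PyTorch sketch of counting it (the helper count_relus and the toy network are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

def count_relus(model: nn.Module, input_shape=(1, 3, 32, 32)) -> int:
    """Count ReLU evaluations in one forward pass via forward hooks."""
    total = 0
    hooks = []

    def hook(_module, _inputs, output):
        nonlocal total
        total += output.numel()  # one ReLU per activation element

    for m in model.modules():
        if isinstance(m, nn.ReLU):
            hooks.append(m.register_forward_hook(hook))

    with torch.no_grad():
        model(torch.zeros(input_shape))
    for h in hooks:
        h.remove()
    return total

# Example: each ReLU layer costs (channels * height * width) per image.
net = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),              # 64*32*32 = 65,536
    nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),   # 64*16*16 = 16,384
)
print(count_relus(net))  # 81,920: the ReLU budget this model consumes
```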
Coordinate Descent for Network Linearization
Rakhlin, Vlad, Jevnisek, Amir, Avidan, Shai
ReLU activations are the main latency bottleneck in private inference over ResNet-style networks, because they incur significant inference latency. Reducing the ReLU count is a discrete optimization problem, and there are two common ways to approach it. Most current state-of-the-art methods are based on a smooth approximation that jointly optimizes network accuracy and the ReLU budget. However, the final hard-thresholding step of the optimization usually introduces a large performance loss. We take an alternative approach that works directly in the discrete domain by leveraging coordinate descent as our optimization framework. In contrast to previous methods, this yields a sparse solution by design. We demonstrate, through extensive experiments, that our method achieves state-of-the-art results on common benchmarks.
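A minimal sketch of the coordinate-descent idea: optimize a binary keep/drop mask over ReLU units one coordinate at a time, accepting only flips that improve accuracy while staying within the budget, so the solution is sparse by construction (the greedy acceptance rule and names here are assumptions, not the authors' exact procedure):

```python
import numpy as np

def coordinate_descent_mask(eval_acc, n_units: int, budget: int, n_sweeps: int = 3):
    """Greedy coordinate descent over a binary ReLU mask.

    eval_acc(mask) -> validation accuracy of the network where
    mask[i] == 1 keeps ReLU unit i and mask[i] == 0 linearizes it.
    The solution has at most `budget` ones throughout.
    """
    rng = np.random.default_rng(0)
    # Start from a random feasible point on the budget constraint.
    mask = np.zeros(n_units, dtype=np.int8)
    mask[rng.choice(n_units, size=budget, replace=False)] = 1
    best = eval_acc(mask)

    for _ in range(n_sweeps):
        for i in rng.permutation(n_units):  # one coordinate at a time
            flipped = mask.copy()
            flipped[i] ^= 1
            if flipped.sum() > budget:      # stay inside the ReLU budget
                continue
            acc = eval_acc(flipped)
            if acc > best:                  # accept only improving flips
                mask, best = flipped, acc
    return mask, best
```

The caller supplies eval_acc; since each coordinate flip costs one evaluation, in practice one would score flips on a small held-out batch rather than the full validation set.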
Review for NeurIPS paper: CryptoNAS: Private Inference on a ReLU Budget
The authors argue that, when using MiniONN, multiplications and additions are nearly free while ReLU operations are expensive; this is very different from inference on non-encrypted data, where multiply-adds tend to dominate the total runtime. They propose a combination of manual network modifications and Neural Architecture Search to find architectures that offer good tradeoffs between accuracy and ReLU count. The techniques are: 1) "ReLU shuffling": manually changing the positions of certain ReLU layers so that ReLUs are applied to feature maps with fewer channels.
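The channel-count argument behind ReLU shuffling can be made concrete; a hypothetical PyTorch sketch (not the authors' code):

```python
import torch.nn as nn

# ReLU after a channel-expanding convolution: applied to 256 channels.
wide = nn.Sequential(nn.Conv2d(64, 256, 1), nn.ReLU())

# "Shuffled": ReLU before the expansion, applied to only 64 channels,
# so a 32x32 feature map costs 64*32*32 ReLUs instead of 256*32*32.
narrow = nn.Sequential(nn.ReLU(), nn.Conv2d(64, 256, 1))
```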
Linearizing Models for Efficient yet Robust Private Inference
Sarkar, Sreetama, Kundu, Souvik, Beerel, Peter A.
The growing concern about data privacy has led to the development of private inference (PI) frameworks for client-server applications that protect both data privacy and model IP. However, the required cryptographic primitives incur significant latency overhead, which limits widespread adoption. At the same time, changing environments demand that the PI service be robust against various naturally occurring and gradient-based perturbations. Despite several works focused on the development of latency-efficient models suitable for PI, the impact of these models on robustness has remained unexplored. Towards this goal, this paper presents RLNet, a class of robust linearized networks that yield latency improvements via reduction of high-latency ReLU operations while improving model performance on both clean and corrupted images. In particular, RLNet models provide a "triple win ticket" of improved classification accuracy on clean, naturally perturbed, and gradient-based perturbed images using a shared-mask, shared-weight architecture with over an order of magnitude fewer ReLUs than baseline models. To demonstrate the efficacy of RLNet, we perform extensive experiments with ResNet and WRN model variants on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Our evaluations show that RLNet can yield models with up to 11.14x fewer ReLUs, with accuracy close to the all-ReLU models, on clean, naturally perturbed, and gradient-based perturbed images. Compared with SoTA non-robust linearized models at similar ReLU budgets, RLNet improves adversarial accuracy by up to ~47%, naturally perturbed accuracy by up to ~16.4%, and clean image accuracy by up to ~1.5%.
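One plausible reading of the shared-mask, shared-weight design is a layer that applies ReLU only where a learned binary mask is set and the identity elsewhere, with the same mask and weights serving clean and perturbed inputs; a hypothetical sketch (the module name, per-position mask granularity, and straight-through trick are assumptions):

```python
import torch
import torch.nn as nn

class MaskedReLU(nn.Module):
    """Applies ReLU only where a learned binary mask is 1; identity elsewhere.

    The mask and weights are shared across clean and perturbed inputs,
    so clean and robust accuracy are optimized with one sparse ReLU set.
    """
    def __init__(self, shape):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(shape))  # one logit per position

    def forward(self, x):
        hard = (self.logits > 0).float()
        # Straight-through estimator: hard mask forward, soft gradient backward.
        soft = torch.sigmoid(self.logits)
        mask = hard + (soft - soft.detach())
        return mask * torch.relu(x) + (1 - mask) * x
```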
AutoReP: Automatic ReLU Replacement for Fast Private Network Inference
Peng, Hongwu, Huang, Shaoyi, Zhou, Tong, Luo, Yukui, Wang, Chenghong, Wang, Zigeng, Zhao, Jiahui, Xie, Xi, Li, Ang, Geng, Tony, Mahmood, Kaleel, Wen, Wujie, Xu, Xiaolin, Ding, Caiwen
The growth of the machine-learning-as-a-service (MLaaS) market has highlighted clients' data privacy and security issues. Private inference (PI) techniques using cryptographic primitives offer a solution but often carry high computation and communication costs, particularly with non-linear operators like ReLU. Many attempts to reduce ReLU operations exist, but they may require heuristic threshold selection or cause substantial accuracy loss. This work introduces AutoReP, a gradient-based approach that reduces non-linear operators and alleviates these issues. It automates the selection between ReLU and polynomial functions to speed up PI applications, and introduces distribution-aware polynomial approximation (DaPa) to maintain model expressivity while accurately approximating ReLUs. Our experimental results demonstrate significant accuracy improvements of 6.12% (94.31%, 12.9K ReLU budget, CIFAR-10), 8.39% (74.92%, 12.9K ReLU budget, CIFAR-100), and 9.45% (63.69%, 55K ReLU budget, Tiny-ImageNet) over current state-of-the-art methods, e.g., SNL. Moreover, applying AutoReP to EfficientNet-B2 on the ImageNet dataset achieves 75.55% accuracy with a 176.1x ReLU budget reduction.
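Sketching the core mechanism under stated assumptions: a trainable per-position indicator chooses between ReLU and a low-degree polynomial, with the hard choice in the forward pass and a straight-through gradient. The degree-2 form and initial coefficients are illustrative; AutoReP's DaPa fits the polynomial to each activation's input distribution:

```python
import torch
import torch.nn as nn

class ReLUOrPoly(nn.Module):
    """Each position uses ReLU (m=1) or a degree-2 polynomial (m=0).

    A degree-2 polynomial is MPC/HE-friendly: it costs roughly one
    secret multiplication instead of an expensive non-linear protocol.
    """
    def __init__(self, shape):
        super().__init__()
        self.m_logits = nn.Parameter(torch.zeros(shape))          # replacement indicator
        self.coef = nn.Parameter(torch.tensor([0.25, 0.5, 0.1]))  # a*x^2 + b*x + c

    def forward(self, x):
        a, b, c = self.coef
        poly = a * x * x + b * x + c
        hard = (self.m_logits > 0).float()
        soft = torch.sigmoid(self.m_logits)
        m = hard + (soft - soft.detach())   # straight-through gradient
        return m * torch.relu(x) + (1 - m) * poly
```

A budget penalty on the number of positions with m = 1 would then steer training toward the target ReLU count.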
Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference
Kundu, Souvik, Zhang, Yuke, Chen, Dake, Beerel, Peter A.
The large number of ReLU and MAC operations in deep neural networks makes them ill-suited for latency- and compute-efficient private inference. In this paper, we present a model optimization method that allows a model to learn to be shallow. In particular, we leverage the ReLU sensitivity of a convolutional block to remove a ReLU layer and merge its preceding and succeeding convolution layers into a single shallow block. Unlike existing ReLU reduction methods, our joint reduction method can yield models with improved reduction of both ReLUs and linear operations, by up to 1.73x and 1.47x respectively, evaluated with ResNet18 on CIFAR-100 without any significant accuracy drop.
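The merge step relies on the fact that, once the ReLU between two convolutions is removed, their composition is linear and folds into a single layer. A minimal sketch for the 1x1 case, where the fold is a matrix product of the weights (a general k x k merge composes the kernels instead; merge_1x1_convs is an illustrative helper, not the paper's code):

```python
import torch
import torch.nn as nn

def merge_1x1_convs(conv1: nn.Conv2d, conv2: nn.Conv2d) -> nn.Conv2d:
    """Fold conv2(conv1(x)) into one conv; valid only with no ReLU between."""
    w1 = conv1.weight.squeeze(-1).squeeze(-1)   # (mid, in)
    w2 = conv2.weight.squeeze(-1).squeeze(-1)   # (out, mid)
    merged = nn.Conv2d(conv1.in_channels, conv2.out_channels, 1, bias=True)
    merged.weight.data = (w2 @ w1).unsqueeze(-1).unsqueeze(-1)
    # W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
    merged.bias.data = w2 @ conv1.bias.data + conv2.bias.data
    return merged

c1, c2 = nn.Conv2d(16, 32, 1), nn.Conv2d(32, 8, 1)
x = torch.randn(1, 16, 4, 4)
fused = merge_1x1_convs(c1, c2)
print(torch.allclose(fused(x), c2(c1(x)), atol=1e-5))  # True: one shallow layer
```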
Learning to Linearize Deep Neural Networks for Secure and Efficient Private Inference
Kundu, Souvik, Lu, Shunlin, Zhang, Yuke, Liu, Jacqueline, Beerel, Peter A.
The large number of ReLU non-linearity operations in existing deep neural networks makes them ill-suited for latency-efficient private inference (PI). Existing techniques to reduce ReLU operations often involve manual effort and sacrifice significant accuracy. In this paper, we first present a novel measure of a non-linearity layer's ReLU sensitivity, enabling mitigation of the time-consuming manual effort of identifying it. Based on this sensitivity, we then present SENet, a three-stage training method that, for a given ReLU budget, automatically assigns per-layer ReLU counts, decides the ReLU locations within each layer's activation map, and trains a model with significantly fewer ReLUs to yield latency- and communication-efficient PI. Experimental evaluations with multiple models on various datasets show SENet's superior performance both in terms of reduced ReLUs and improved classification accuracy compared to existing alternatives. In particular, SENet can yield models that require up to 2x fewer ReLUs while yielding similar accuracy, and for a similar ReLU budget it can yield models with 2.32% improved classification accuracy, evaluated on CIFAR-100.
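A rough sketch of the allocation step this implies: estimate each layer's ReLU sensitivity (e.g., the accuracy drop when that layer is fully linearized) and split the global budget in proportion. The proportional rule and function name are assumptions; SENet's actual three-stage procedure is more involved:

```python
def allocate_relu_budget(sensitivity, layer_sizes, total_budget):
    """Assign per-layer ReLU counts proportional to ReLU sensitivity.

    sensitivity[l]: accuracy drop when layer l is fully linearized.
    layer_sizes[l]: number of activation positions in layer l (the cap).
    """
    weight_sum = sum(sensitivity)
    counts = []
    for s, size in zip(sensitivity, layer_sizes):
        share = round(total_budget * s / weight_sum)
        counts.append(min(share, size))   # cannot keep more ReLUs than exist
    return counts

# Example: the most sensitive layer gets most of a 50K global budget.
print(allocate_relu_budget([0.08, 0.02, 0.01], [65536, 16384, 4096], 50_000))
# -> [36364, 9091, 4096]; rounding and caps leave slack to redistribute
```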